Exercise Type 2: Beta-Bernoulli Coin Toss
What the exam asks: You observe coin tosses. You have a prior belief about the coin's bias. You must compute the likelihood, posterior, evidence, and/or predictive probability.
Part 0: What Do All These Symbols Mean?
New Symbols for This Exercise Type
| Symbol | What It Looks Like | What It Means |
|---|---|---|
| $\mu$ | Greek letter "mu" (looks like a u) | The bias of the coin. $\mu = 0.7$ means 70% chance of "1" |
| $x_n$ | x with subscript n | The n-th coin toss outcome |
| $D$ | Capital D | The dataset — all the toss results we've seen |
| $\text{Beta}(\cdot)$ | "Beta" with parentheses | The Beta distribution: a probability distribution over values between 0 and 1, used for our belief about $\mu$ |
| $\alpha$ | Greek letter "alpha" (looks like a fish) | A parameter of the Beta distribution — "pseudo-count" of ones |
| $\beta$ | Greek letter "beta" | A parameter of the Beta distribution — "pseudo-count" of zeros |
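| $\Gamma$ | Capital Greek letter Gamma | The Gamma function: generalizes the factorial to non-integers |
| $B(a, b)$ | Capital B with two arguments | The Beta function: $B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$; $1/B(\alpha, \beta)$ is the normalization constant of $\text{Beta}(\mu \mid \alpha, \beta)$ |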
| $N_1$ | Capital N with subscript 1 | The number of ones observed in the data |
| $N_0$ | Capital N with subscript 0 | The number of zeros observed in the data |
| $\mathbb{E}[\cdot]$ | E with square brackets | "Expected value" — the average you'd expect |
The Bernoulli Distribution
The Bernoulli distribution describes a single coin toss:

$$p(x_n|\mu) = \mu^{x_n}(1-\mu)^{1-x_n}, \quad x_n \in \{0, 1\}$$
How to read this: "The probability of outcome $x_n$ given coin bias $\mu$ equals $\mu$ raised to the power $x_n$, times $(1-\mu)$ raised to the power $1-x_n$."
The clever trick: This formula "selects" the right probability:
- When $x_n = 1$: $p(1|\mu) = \mu^1(1-\mu)^0 = \mu \times 1 = \mu$
- When $x_n = 0$: $p(0|\mu) = \mu^0(1-\mu)^1 = 1 \times (1-\mu) = 1-\mu$
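Why write it this fancy way? Because the tosses are independent given $\mu$, the probability of ALL tosses is the product of these terms, and the exponents add up into one compact formula: $\prod_{n=1}^{N} \mu^{x_n}(1-\mu)^{1-x_n} = \mu^{N_1}(1-\mu)^{N_0}$.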
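To see that the product over tosses really collapses into the count form $\mu^{N_1}(1-\mu)^{N_0}$, here is a minimal Python sketch (Python 3.8+ for `math.prod`). The function names are illustrative choices, not from any library, and the test value `mu = 0.3` is arbitrary.

```python
import math

def bernoulli(x, mu):
    """Single-toss probability p(x | mu) = mu**x * (1 - mu)**(1 - x), for x in {0, 1}."""
    return mu**x * (1 - mu) ** (1 - x)

def likelihood_product(data, mu):
    """Likelihood of an ordered sequence of tosses: the product of per-toss Bernoulli terms."""
    return math.prod(bernoulli(x, mu) for x in data)

def likelihood_compact(data, mu):
    """The same quantity written with counts: mu**N1 * (1 - mu)**N0."""
    n1 = sum(data)          # number of ones
    n0 = len(data) - n1     # number of zeros
    return mu**n1 * (1 - mu) ** n0

data = [0, 1, 0, 0, 1, 0, 0]   # the 2022 exam data used later in this section
mu = 0.3                       # arbitrary test value for the bias
print(likelihood_product(data, mu), likelihood_compact(data, mu))
```

The two printed numbers agree up to floating-point rounding, whatever value of `mu` you pick.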
The Beta Distribution
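The Beta distribution is a probability distribution describing our belief about μ:

$$\text{Beta}(\mu|\alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\mu^{\alpha-1}(1-\mu)^{\beta-1}, \quad 0 \le \mu \le 1$$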
How to read this: "The probability density of μ given parameters α and β equals a normalization constant (the Gamma fraction) times $\mu^{\alpha-1}(1-\mu)^{\beta-1}$."
Don't panic about the Gamma function. For positive integers, $\Gamma(n) = (n-1)!$. So:
- $\Gamma(5) = 4! = 24$
- $\Gamma(3) = 2! = 2$
- $\Gamma(1) = 0! = 1$
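A quick check of the factorial rule with Python's standard `math.gamma` and `math.factorial`. The last line illustrates that Γ also accepts non-integers, where $\Gamma(1/2) = \sqrt{\pi}$.

```python
import math

# For positive integers n, Gamma(n) equals (n - 1)!
for n in [1, 3, 5]:
    print(n, math.gamma(n), math.factorial(n - 1))

# Gamma is also defined between the integers, e.g. Gamma(0.5) = sqrt(pi)
print(math.gamma(0.5), math.sqrt(math.pi))
```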
The intuitive meaning of α and β:
- $\alpha$ = "number of pseudo-ones" (imaginary tosses that came up 1)
- $\beta$ = "number of pseudo-zeros" (imaginary tosses that came up 0)
- $\alpha + \beta$ = total pseudo-observations
Example: Beta(μ|3, 2) means "before seeing real data, it's as if I'd already seen 3 ones and 2 zeros."
The Mean of the Beta Distribution
$$\mathbb{E}[\mu] = \frac{\alpha}{\alpha + \beta}$$

In plain English: "The average value of μ under my Beta belief is α divided by the total pseudo-count α + β."
Example: If α=3, β=2: mean = 3/(3+2) = 3/5 = 0.6
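A small sketch that codes the Beta density $\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\mu^{\alpha-1}(1-\mu)^{\beta-1}$ directly and checks, with a simple midpoint-rule integral (a numerical approximation chosen here for illustration), that it integrates to 1 and that its mean is α/(α+β). The helper names `beta_pdf` and `integrate` are hypothetical.

```python
import math

def beta_pdf(mu, a, b):
    """Beta(mu | a, b) = Gamma(a + b) / (Gamma(a) * Gamma(b)) * mu**(a - 1) * (1 - mu)**(b - 1)."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * mu ** (a - 1) * (1 - mu) ** (b - 1)

def integrate(f, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    h = 1.0 / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))

a, b = 3, 2
total = integrate(lambda m: beta_pdf(m, a, b))      # should be close to 1
mean = integrate(lambda m: m * beta_pdf(m, a, b))   # should be close to a / (a + b) = 0.6
print(round(total, 6), round(mean, 6), a / (a + b))
```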
Part 1: The Core Concepts — No Math
What's Going On?
- You have a coin. You don't know its bias μ.
- Before tossing, you have a prior belief: "I think the coin is probably somewhat fair, maybe slightly biased." This is your Beta prior.
- You toss the coin several times and record results. This is your data D.
- After seeing the data, you update your belief about μ. This is your posterior.
The Magic Rule: Conjugacy
When the prior is Beta and the likelihood is Bernoulli, the posterior is ALSO Beta.
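And updating is incredibly simple:

$$\underbrace{\text{Beta}(\mu|\alpha, \beta)}_{\text{prior}} \;\longrightarrow\; \underbrace{\text{Beta}(\mu|\alpha + N_1, \beta + N_0)}_{\text{posterior}}$$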
In plain English: Just add the real counts to the pseudo-counts. That's it!
Example:
- Prior: Beta(3, 2) (as if 3 ones and 2 zeros)
- Data: {0, 1, 0, 0, 1, 0, 0} (2 ones and 5 zeros)
- Posterior: Beta(3+2, 2+5) = Beta(5, 7)
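The update rule fits in a few lines of Python, assuming the data arrive as a list of 0/1 values. `count_outcomes` and `posterior_params` are illustrative names, not a library API.

```python
def count_outcomes(data):
    """Return (N1, N0): the number of ones and the number of zeros in the data."""
    n1 = sum(1 for x in data if x == 1)
    n0 = sum(1 for x in data if x == 0)
    return n1, n0

def posterior_params(alpha, beta, data):
    """Beta-Bernoulli conjugate update: Beta(alpha, beta) -> Beta(alpha + N1, beta + N0)."""
    n1, n0 = count_outcomes(data)
    return alpha + n1, beta + n0

data = [0, 1, 0, 0, 1, 0, 0]
print(count_outcomes(data))           # (2, 5)
print(posterior_params(3, 2, data))   # (5, 7)
```

Only the counts matter, so the order of the tosses does not change the posterior.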
The Four Things You Might Be Asked
| What | Symbol | How to Compute |
|---|---|---|
| Likelihood | $p(D \mid \mu)$ | $\mu^{N_1}(1-\mu)^{N_0}$ |
| Posterior | $p(\mu \mid D)$ | $\text{Beta}(\alpha + N_1, \beta + N_0)$ |
| Evidence | $p(D)$ | $B(\alpha+N_1, \beta+N_0) \, / \, B(\alpha, \beta)$ |
| Predictive | $p(x=1 \mid D)$ | $(\alpha+N_1) \, / \, (\alpha+\beta+N)$, with $N = N_1 + N_0$ |
Part 2: The Key Formulas (MEMORIZE)
Formula 1: Likelihood

$$p(D|\mu) = \prod_{n=1}^{N} \mu^{x_n}(1-\mu)^{1-x_n} = \mu^{N_1}(1-\mu)^{N_0}$$
IMPORTANT: There is NO binomial coefficient $\binom{N}{N_1}$ in the likelihood. $D$ is one specific ordered sequence of tosses, so its likelihood is just the product of the individual toss probabilities.
Formula 2: Posterior

$$p(\mu|D) = \text{Beta}(\mu|\alpha + N_1, \beta + N_0)$$
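Formula 3: Evidence

$$p(D) = \int_0^1 p(D|\mu)\,p(\mu)\,d\mu = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(\alpha+N_1)\,\Gamma(\beta+N_0)}{\Gamma(\alpha+\beta+N)}$$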
Where $N = N_1 + N_0$ (total number of tosses).
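As a sanity check of Formula 3, the sketch below compares the closed form $\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(\alpha+N_1)\Gamma(\beta+N_0)}{\Gamma(\alpha+\beta+N)}$ with a brute-force midpoint-rule estimate of $\int_0^1 p(D|\mu)\,\text{Beta}(\mu|\alpha,\beta)\,d\mu$. The function names and the step count are arbitrary choices for illustration.

```python
import math

def evidence(alpha, beta, n1, n0):
    """Closed form: Gamma(a+b)/(Gamma(a)Gamma(b)) * Gamma(a+N1)Gamma(b+N0)/Gamma(a+b+N)."""
    n = n1 + n0
    prior_norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    post_part = math.gamma(alpha + n1) * math.gamma(beta + n0) / math.gamma(alpha + beta + n)
    return prior_norm * post_part

def evidence_numeric(alpha, beta, n1, n0, steps=10_000):
    """Midpoint-rule estimate of the integral of p(D|mu) * Beta(mu|alpha, beta) over [0, 1]."""
    prior_norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        mu = (i + 0.5) * h
        likelihood = mu**n1 * (1 - mu) ** n0
        prior = prior_norm * mu ** (alpha - 1) * (1 - mu) ** (beta - 1)
        total += likelihood * prior
    return total * h

# 2022 exam setup: Beta(3, 2) prior, N1 = 2 ones, N0 = 5 zeros
print(evidence(3, 2, 2, 5), evidence_numeric(3, 2, 2, 5))
```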
Formula 4: Predictive Probability

$$p(x_{next}=1|D) = \frac{\alpha + N_1}{\alpha + \beta + N}$$
This is just the posterior mean.
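Using Python's standard `fractions.Fraction` keeps the predictive probability $(\alpha+N_1)/(\alpha+\beta+N)$ exact, which makes it easy to match multiple-choice options written as fractions. `predictive_one` is a hypothetical helper name.

```python
from fractions import Fraction

def predictive_one(alpha, beta, n1, n0):
    """p(x_next = 1 | D) = (alpha + N1) / (alpha + beta + N), i.e. the posterior mean."""
    return Fraction(alpha + n1, alpha + beta + n1 + n0)

print(predictive_one(3, 2, 2, 5))   # 2022 setup: 5/12
print(predictive_one(3, 2, 3, 2))   # 2023 setup: 3/5
```

With no data ($N_1 = N_0 = 0$) the same formula falls back to the prior mean α/(α+β).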
Part 3: FULL Walkthrough of Real Exam Questions
THE EXAM QUESTION (2022, Question 3 — Complete)
Consider a biased coin with outcomes $x_n = 1$ (tails) and $x_n = 0$ (heads), and the model:
- Bernoulli: $p(x_n|\mu) = \mu^{x_n}(1-\mu)^{1-x_n}$
- Beta prior: $p(\mu) = \text{Beta}(\mu|\alpha=3, \beta=2)$
We throw 7 times and observe: $D = \{0, 1, 0, 0, 1, 0, 0\}$
Question 3a: Interpretation of α=3, β=2
Which interpretation is most valid?
- (a) 5 pseudo tosses, 2 tails and 1 heads
- (b) 3 pseudo tosses, 2 tails and 1 heads
- (c) P(tails) = 2/3 × P(heads)
- (d) 5 pseudo tosses, 3 tails and 2 heads
STEP-BY-STEP SOLUTION
Step 1: Understand what α and β represent
The problem says:
- $x_n = 1$ means tails
- $x_n = 0$ means heads

So:
- $\alpha$ = pseudo-count of ones = pseudo-count of tails
- $\beta$ = pseudo-count of zeros = pseudo-count of heads
Step 2: Read off the values
- $\alpha = 3$ → 3 pseudo-tails
- $\beta = 2$ → 2 pseudo-heads
- Total = $3 + 2 = 5$ pseudo-tosses
Step 3: Match the answer
(d) says "5 pseudo tosses, 3 tails and 2 heads" — this matches exactly.
Answer: (d) ✅
Question 3b: The Likelihood $p(D|\mu)$
Options:
- (a) $\binom{5}{2} \cdot \mu^5(1-\mu)^2$
- (b) $\mu^5(1-\mu)^2$
- (c) $\mu^2(1-\mu)^5$
- (d) $\mu^1(1-\mu)^4$
STEP-BY-STEP SOLUTION
Step 1: Count ones and zeros in the data
$D = \{0, 1, 0, 0, 1, 0, 0\}$

Let me go through each element:
- Position 1: 0
- Position 2: 1
- Position 3: 0
- Position 4: 0
- Position 5: 1
- Position 6: 0
- Position 7: 0

Count:
- Number of zeros ($N_0$) = 5 (positions 1, 3, 4, 6, 7)
- Number of ones ($N_1$) = 2 (positions 2, 5)
Step 2: Write the likelihood formula

$$p(D|\mu) = \mu^{N_1}(1-\mu)^{N_0} = \mu^2(1-\mu)^5$$
Step 3: Match the answer
(c) says $\mu^2(1-\mu)^5$ — matches.
Answer: (c) ✅
WHY (a) AND (b) ARE WRONG
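(a) has $\binom{5}{2}$, a binomial coefficient. The likelihood does NOT include this: $D$ is one specific ordered sequence of tosses, so its probability is just the product of the individual toss probabilities. The binomial coefficient appears in the Binomial distribution, which asks "what's the probability of exactly $k$ ones in $N$ tosses, in any order?" (For 7 tosses it would even be $\binom{7}{2}$, not $\binom{5}{2}$, and the exponents in (a) are reversed as well, just like in (b).)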
(b) has the powers reversed: $\mu^5(1-\mu)^2$. This would mean 5 ones and 2 zeros, but we have 2 ones and 5 zeros.
Question 3c: The Posterior $p(\mu|D)$
Options:
- (a) $\text{Beta}(\mu|4, 6)$
- (b) $\mu^4(1-\mu)^6$
- (c) $\mu^5(1-\mu)^7$
- (d) $\text{Beta}(\mu|5, 7)$
STEP-BY-STEP SOLUTION
Step 1: Apply the conjugacy rule
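Prior: Beta(α=3, β=2)

Data: $N_1 = 2$ ones, $N_0 = 5$ zeros

$$p(\mu|D) = \text{Beta}(\mu|\alpha + N_1, \beta + N_0) = \text{Beta}(\mu|3+2, 2+5) = \text{Beta}(\mu|5, 7)$$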
Step 2: Match the answer
(d) says Beta(μ|5, 7) — matches.
Answer: (d) ✅
WHY (a) IS WRONG
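(a) says Beta(4, 6). That comes from reading the exponents of the posterior kernel $\mu^{4}(1-\mu)^{6}$ as the Beta parameters, forgetting the "-1" in $\mu^{\alpha-1}(1-\mu)^{\beta-1}$. The parameters are the prior parameters plus the DATA counts: Beta(3+2, 2+5) = Beta(5, 7).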
WHY (b) AND (c) ARE WRONG
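(b) $\mu^4(1-\mu)^6 = \mu^{5-1}(1-\mu)^{7-1}$ is the kernel of Beta(5, 7), but without the normalization constant $\frac{\Gamma(12)}{\Gamma(5)\Gamma(7)}$, so it does not integrate to 1. (c) $\mu^5(1-\mu)^7$ is not even proportional to the posterior: its exponents correspond to Beta(6, 8). The posterior is a PROPER Beta distribution, not just a kernel.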
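One way to see the difference between (b) and (c) is to normalize each kernel and compute its mean exactly. For non-negative integers $a, b$ the sketch uses the standard identity $\int_0^1 \mu^a(1-\mu)^b\,d\mu = \frac{a!\,b!}{(a+b+1)!}$; the helper names are illustrative.

```python
from fractions import Fraction
from math import factorial

def kernel_integral(a, b):
    """Exact integral of mu**a * (1 - mu)**b over [0, 1] for integers a, b >= 0.

    Equals a! b! / (a + b + 1)!, i.e. the Beta function B(a + 1, b + 1).
    """
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def kernel_mean(a, b):
    """Mean of mu after normalizing the kernel mu**a * (1 - mu)**b into a density."""
    return kernel_integral(a + 1, b) / kernel_integral(a, b)

print(kernel_mean(4, 6))   # option (b)'s kernel: 5/12, the mean of Beta(5, 7)
print(kernel_mean(5, 7))   # option (c)'s kernel: 3/7, the mean of Beta(6, 8)
```

Kernel (b) has the right shape and only lacks its constant; kernel (c) describes a different distribution altogether.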
Question 3d: Predictive Probability
Compute the probability of throwing tails after absorbing the data.
Options:
- (a) $4/11$
- (b) $3/5$
- (c) $1/2$
- (d) $5/12$
STEP-BY-STEP SOLUTION
Step 1: What are we computing?
$p(x_{next}=1|D)$ = probability the next toss is tails (=1), given all the data we've seen.
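Step 2: This equals the posterior mean

$$p(x_{next}=1|D) = \int_0^1 \mu\, p(\mu|D)\, d\mu = \mathbb{E}[\mu|D] = \frac{\alpha + N_1}{\alpha + \beta + N}$$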
Step 3: Plug in posterior parameters
Posterior = Beta(5, 7), so:

$$p(x_{next}=1|D) = \frac{5}{5+7} = \frac{5}{12}$$
Answer: (d) ✅
SECOND EXAM WALKTHROUGH (2023, Question 3)
- Coin: $x_n = 0$ (tails), $x_n = 1$ (heads). NOTE: This is FLIPPED from the previous exam!
- Bernoulli: $p(x_n|\mu) = \mu^{x_n}(1-\mu)^{1-x_n}$
- Beta prior: $p(\mu) = \text{Beta}(\mu|\alpha=3, \beta=2)$
- Data: $D = \{0, 1, 1, 0, 1\}$ (5 throws)
Question 3a: Likelihood
Options:
- (a) $\mu^3(1-\mu)^2$
- (b) $\binom{5}{3} \cdot \mu^2(1-\mu)^3$
- (c) $\binom{5}{2} \cdot \mu^3(1-\mu)^2$
- (d) $\binom{3}{2} \cdot \mu^3(1-\mu)^2$
STEP-BY-STEP SOLUTION
Step 1: Count ones and zeros
$D = \{0, 1, 1, 0, 1\}$
- $N_0$ = 2 (positions 1, 4)
- $N_1$ = 3 (positions 2, 3, 5)
Step 2: Write likelihood

$$p(D|\mu) = \mu^{N_1}(1-\mu)^{N_0} = \mu^3(1-\mu)^2$$
Answer: (a) ✅ (No binomial coefficient!)
Question 3b: Posterior
Options:
- (a) $\binom{5}{2} \cdot \mu^3(1-\mu)^2$
- (b) $\text{Beta}(\mu|6, 4)$
- (c) $\text{Beta}(\mu|5, 5)$
- (d) $\mu^3(1-\mu)^2 \cdot \text{Beta}(\mu|\alpha=3, \beta=2)$
STEP-BY-STEP SOLUTION
Prior: Beta(3, 2)

Data: $N_1 = 3$ ones, $N_0 = 2$ zeros
Posterior: Beta(3+3, 2+2) = Beta(6, 4)
Answer: (b) ✅
Question 3c: Evidence
Options:
- (a) $\frac{\Gamma(4)\Gamma(6)}{\Gamma(10)}$
- (b) $\frac{\Gamma(4)\Gamma(5)\Gamma(6)}{\Gamma(2)\Gamma(3)\Gamma(10)}$
- (c) $\frac{\Gamma(5)}{\Gamma(2)\Gamma(3)}$
- (d) $\frac{\Gamma(5)\Gamma(10)}{\Gamma(2)\Gamma(3)\Gamma(4)\Gamma(6)}$
STEP-BY-STEP SOLUTION
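Step 1: Write the evidence formula

$$p(D) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(\alpha+N_1)\,\Gamma(\beta+N_0)}{\Gamma(\alpha+\beta+N)}$$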
Step 2: Plug in the numbers
- $\alpha = 3$, $\beta = 2$
- $N_1 = 3$, $N_0 = 2$
- $N = 3 + 2 = 5$
- $\alpha + \beta = 5$
- $\alpha + N_1 = 3 + 3 = 6$
- $\beta + N_0 = 2 + 2 = 4$
- $\alpha + \beta + N = 5 + 5 = 10$
Step 3: Match the answer
(b) says $\frac{\Gamma(4)\Gamma(5)\Gamma(6)}{\Gamma(2)\Gamma(3)\Gamma(10)}$
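Let me check by rearranging my result:

$$p(D) = \frac{\Gamma(5)}{\Gamma(3)\Gamma(2)} \cdot \frac{\Gamma(6)\,\Gamma(4)}{\Gamma(10)} = \frac{\Gamma(4)\Gamma(5)\Gamma(6)}{\Gamma(2)\Gamma(3)\Gamma(10)}$$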
Yes, matches (b).
Answer: (b) ✅
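The sketch below evaluates option (b) exactly with integer factorials and cross-checks it a second way, assuming the chain rule $p(D) = \prod_n p(x_n \mid x_1, \dots, x_{n-1})$: each factor is the current predictive probability of the observed outcome, after which that count grows by 1. This sequential view is only an independent check, not the method the exam expects; `gamma_int` and `evidence_sequential` are hypothetical names.

```python
from fractions import Fraction
from math import factorial

def gamma_int(n):
    """Gamma(n) = (n - 1)! for a positive integer n, kept as an exact integer."""
    return factorial(n - 1)

# Option (b): Gamma(4) Gamma(5) Gamma(6) / (Gamma(2) Gamma(3) Gamma(10))
option_b = Fraction(gamma_int(4) * gamma_int(5) * gamma_int(6),
                    gamma_int(2) * gamma_int(3) * gamma_int(10))

def evidence_sequential(alpha, beta, data):
    """Chain-rule cross-check: p(D) = prod_n p(x_n | x_1, ..., x_{n-1}).

    Each factor is the current count for the observed outcome (prior pseudo-count
    plus matching tosses seen so far) divided by the running total.
    """
    p = Fraction(1)
    a, b = alpha, beta
    for x in data:
        if x == 1:
            p *= Fraction(a, a + b)
            a += 1
        else:
            p *= Fraction(b, a + b)
            b += 1
    return p

print(option_b, evidence_sequential(3, 2, [0, 1, 1, 0, 1]))   # 1/42 1/42
```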
Question 3d: Predictive Probability
Options:
- (a) $\text{Beta}(0.6|6, 4)$
- (b) $0.6$
- (c) $0.7$
- (d) $0.4$
STEP-BY-STEP SOLUTION
Posterior = Beta(6, 4)
Predictive = posterior mean = $\frac{6}{6+4} = \frac{6}{10} = 0.6$
Answer: (b) 0.6 ✅
Part 4: Tricks & Shortcuts
TRICK 1: ALWAYS Check Which Outcome = 1
Different exams define it differently:
- 2022: 1 = tails, 0 = heads
- 2023: 1 = heads, 0 = tails
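α always counts the ones and β the zeros, so the convention decides whether α stands for tails or heads, and whether "probability of heads" means $p(x=1|D)$ or $p(x=0|D)$.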
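A defensive habit, sketched in Python: keep the raw heads/tails labels and encode them explicitly, so the convention is written down once. The `encode` helper and its `one_means` parameter are hypothetical, and the label sequence is simply the 2022 data written as physical outcomes.

```python
# Hypothetical helper: `encode` and `one_means` are illustrative names, not from any library.
def encode(tosses, one_means):
    """Map 'heads'/'tails' labels to 1/0, where `one_means` is the label the exam codes as 1."""
    return [1 if t == one_means else 0 for t in tosses]

# The 2022 data written as physical outcomes (that exam codes tails as 1).
tosses = ["heads", "tails", "heads", "heads", "tails", "heads", "heads"]

d_2022 = encode(tosses, one_means="tails")   # 2022 convention: 1 = tails
d_2023 = encode(tosses, one_means="heads")   # 2023 convention: 1 = heads
print(d_2022)  # [0, 1, 0, 0, 1, 0, 0]
print(d_2023)  # [1, 0, 1, 1, 0, 1, 1]

# Same physical tosses under a Beta(3, 2) prior: N1 and N0 swap roles between conventions.
alpha, beta = 3, 2
for name, d in (("2022", d_2022), ("2023", d_2023)):
    n1 = sum(d)
    n0 = len(d) - n1
    print(name, "N1 =", n1, "N0 =", n0, "posterior Beta:", (alpha + n1, beta + n0))
```

Note that the prior changes physical meaning too: under the 2022 convention α = 3 counts pseudo-tails, under the 2023 one it counts pseudo-heads.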
TRICK 2: Likelihood Has NO Binomial Coefficient
If a likelihood option contains a binomial coefficient $\binom{N}{k}$, it's wrong. The likelihood of the observed sequence is simply $\mu^{N_1}(1-\mu)^{N_0}$.
TRICK 3: Posterior Is Always a Proper Beta Distribution
The answer should say "Beta(μ|..., ...)" not just "$\mu^a(1-\mu)^b$".
TRICK 4: Predictive = Posterior Mean = α/(α+β)
Just read off the posterior parameters and divide α by their sum.
TRICK 5: Evidence = Gamma Function Pattern
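$$p(D) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(\alpha+N_1)\,\Gamma(\beta+N_0)}{\Gamma(\alpha+\beta+N)}$$

Look for this pattern in the options: the prior's normalization constant times the inverse of the posterior's normalization constant. The factors may be rearranged into a single fraction, as in the 2023 question.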
TRICK 6: Counting Ones and Zeros
Write the data out and count manually. Don't rush this step.
Part 5: Practice Exercises
Exercise 1
- Coin: $x_n = 0$ (heads), $x_n = 1$ (tails)
- Beta prior: Beta(μ|α=3, β=2)
- Data: $D = \{0, 1, 0, 0, 1, 0, 0\}$
How many ones and zeros are in the data?
Exercise 2
Same data and prior as Exercise 1.
Write the likelihood $p(D|\mu)$.
Exercise 3
Same setup.
Compute the posterior $p(\mu|D)$.
Exercise 4
Same setup.
Compute the predictive probability $p(x_{next}=1|D)$ (probability of tails).
Exercise 5
- Coin: $x_n = 0$ (tails), $x_n = 1$ (heads)
- Beta prior: Beta(μ|α=3, β=2)
- Data: $D = \{0, 1, 1, 0, 1\}$
Write the likelihood $p(D|\mu)$.
Exercise 6
Same setup as Exercise 5.
Compute the posterior $p(\mu|D)$.
Exercise 7
Same setup as Exercise 5.
Compute the evidence $p(D)$.
Exercise 8
Same setup as Exercise 5.
Compute $p(x_{next}=1|D)$ (probability of heads).